5 research outputs found

Spectrum Bandit Optimization

Authors: Marc Lelarge
Alexandre Proutiere
M. Sadegh Talebi
Publication date: 17/02/2015
We consider the problem of allocating radio channels to links in a wireless network. Links interact through interference, modelled as a conflict graph (i.e., two interfering links cannot be simultaneously active on the same channel). We aim to identify the channel allocation that maximizes the total network throughput over a finite time horizon. If the average radio conditions on each channel and on each link were known, an optimal allocation could be obtained by solving an Integer Linear Program (ILP). When radio conditions are unknown a priori, we look for a sequential channel allocation policy that converges to the optimal allocation while minimizing the throughput loss, or regret, incurred along the way by exploring sub-optimal allocations. We formulate this problem as a generic linear bandit problem, and analyze it first in a stochastic setting where radio conditions are driven by a stationary stochastic process, and then in an adversarial setting where radio conditions can evolve arbitrarily. We provide new algorithms in both settings and derive upper bounds on their regrets.
Comment: 21 pages
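The known-conditions case mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' ILP formulation: it simply enumerates every assignment on a tiny instance, and the function name `best_allocation`, the `rates` table, and the `conflicts` list are all hypothetical names chosen for illustration.

```python
from itertools import product

def best_allocation(rates, conflicts):
    """Brute-force channel allocation on a conflict graph (illustrative only).

    rates[i][c]: assumed known mean throughput of link i on channel c.
    conflicts:   list of (i, j) pairs of links that interfere.
    Returns (total_throughput, assignment), where assignment[i] is the
    channel given to link i, or None if link i stays idle.
    """
    n_links = len(rates)
    n_channels = len(rates[0])
    choices = [None] + list(range(n_channels))  # None means the link is idle
    best_value, best_assign = float("-inf"), None
    for assign in product(choices, repeat=n_links):
        # Feasible if no two interfering links are active on the same channel.
        feasible = all(
            assign[i] is None or assign[i] != assign[j]
            for i, j in conflicts
        )
        if not feasible:
            continue
        value = sum(rates[i][c] for i, c in enumerate(assign) if c is not None)
        if value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign

# Hypothetical instance: three links in a line (0-1 and 1-2 interfere), two channels.
example_rates = [[9, 2], [8, 7], [6, 5]]
example_conflicts = [(0, 1), (1, 2)]
value, assignment = best_allocation(example_rates, example_conflicts)
```

Enumeration grows exponentially with the number of links, which is why an ILP solver (or, when the rates are unknown, a bandit policy) would be needed beyond toy sizes.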

arXiv.org e-Print Archive

CiteSeerX

Stochastic Online Shortest Path Routing: The Value of Feedback

Authors: Richard Combes
Mikael Johansson
Alexandre Proutiere
M. Sadegh Talebi
Zhenhua Zou
Publication date: 01/01/2017
This paper studies online shortest path routing over multi-hop networks. Link costs or delays are time-varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. These bounds help us to understand the performance improvements we can expect when (i) making routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. Three algorithms, with a trade-off between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments.
Comment: 18 pages
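The regret described in this abstract can be written out as follows. This is a hedged sketch in notation that is assumed here, not taken from the paper: $\mathcal{P}$ is the set of source-destination paths, $\theta_i$ the expected delay of link $i$, $p^\pi(t)$ the path chosen by policy $\pi$ for the $t$-th packet, and $T$ the number of packets.

```latex
R^\pi(T)
  = \mathbb{E}\!\left[\sum_{t=1}^{T} \sum_{i \in p^\pi(t)} \theta_i\right]
  - T \min_{p \in \mathcal{P}} \sum_{i \in p} \theta_i
```

Under this reading, the first term is the expected cumulative delay incurred by the policy and the second is the cumulative delay of always using the (unknown) optimal path.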

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL-Rennes 1

Combinatorial Bandits Revisited

Authors: Alexandre Proutiere
M. Sadegh Talebi
Marc Lelarge
Richard Combes
Publication date: 05/03/2020
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose COMBEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

CiteSeerX